Skip to content

feat(opentelemetry source)!: Improve HTTP error signalling #22957

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

fbs
Copy link

@fbs fbs commented Apr 28, 2025

Summary

This PR actually does two things:

  1. It migrates from warp::rejection to replying directly. A rejection is meant to signal that this filter failed but that another filter can try. In this case falling back doesn't make sense, if the route is matched there will be no other handler for it. Falling back is just extra work.
  2. It improves the status codes vector sends. Instead of a 500 ISE we can use more over the HTTP spec. A 404 from Vector because you forgot to use /v1/logs is easier to debug than a 500.

Another benefit from 1. is that it makes it easier to add JSON support in the future, we just need to pass the content-type header along. With rejects thats hard as the reject handler doesn't have access to headers. This will be done in a follow up.

This PR is half of #22875, making it easier to review.

I'm not sure if the HTTP status code behaviour is considered breaking, I assume it is so I marked it as such

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

Both tests and using it

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • The CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • cargo fmt --all
      • cargo clippy --workspace --all-targets -- -D warnings
      • cargo nextest run --workspace (alternatively, you can run cargo test --all)
      • ./scripts/check_changelog_fragments.sh
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes Vector dependencies (modifies Cargo.lock), please
    run cargo vdev build licenses to regenerate the license inventory and commit the changes (if any). More details here.

References

#22940

fbs added 3 commits April 28, 2025 23:56
This will allow us to support json in a later stage, as the content type
of the reply must match what the client used in the request.

This also improves error messages a bit, as its now a 415 instead of a
500. A 500 internal server error is quite confusing when debugging.
There is no error with the server, its the client sending the wrong
request.
This tests the expected behaviour of the Open Telemetry source wrt error
handling. Currently vector mostly sends 500 Internal Server errors,
which isnt that helpful when debugging. It also means some retryable
errors are not retried.

The source implementation is done in the next step, so all tests fail.
Reject and recover indicates that the current filter didn't match but
that others might[1]. This means reject is not the appropriate tool for
handling e.g. deserialization errors.

This commit migrates from rejection to returning the reply directly, and
at the same time updates it to the new status codes to make sure it
passes the new tests.

This change also makes it possible to support `application/json` in the
future. The `content-type` of the reply must match that of the request,
and as we now have access to the request headers we can make sure they
match.

[1]: seanmonstar/warp#388 (comment)
@fbs fbs requested a review from a team as a code owner April 28, 2025 22:11
@github-actions github-actions bot added the domain: sources Anything related to the Vector's sources label Apr 28, 2025
}

#[tokio::test]
async fn delivery_error() {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use some help fixing this test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this failing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its failing to fail.

I want vector to fail delivery and end up in the BatchStatus::Errored code path, but it doesn't seem to

@pront pront self-assigned this Apr 29, 2025
@pront
Copy link
Member

pront commented Apr 29, 2025

I'm not sure if the HTTP status code behaviour is considered breaking, I assume it is so I marked it as such

That's true, we will a changelog explaining why this is a desired change.

@fbs
Copy link
Author

fbs commented May 20, 2025

@pront gentle ping. What would be the best way to move this forward?

@pront
Copy link
Member

pront commented May 21, 2025

@pront gentle ping. What would be the best way to move this forward?

We will need a changelog describing the breaking change (before & after). Otherwise this looks good.

message: err_msg.message().into(),
..Default::default()
});
let mut events = events.unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid unwrap()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Is there a nice syntax trick? The if let Err() above should guarantee this unwrap is safe

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, you can use is_err() and return early. Or:

let mut events = match events {
    Ok(events) => ...,
    Err(err) => { ... }
}


let mut resp = reply.into_response();
resp.headers_mut()
.insert(http::header::CONTENT_TYPE, "text/plain".parse().unwrap());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid unwrap

}

#[tokio::test]
async fn delivery_error() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this failing?

Some(receiver) => match receiver.await {
BatchStatus::Delivered => Ok(protobuf(resp).into_response()),
BatchStatus::Errored => reply_with_status(
hyper::StatusCode::INTERNAL_SERVER_ERROR,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to tell why it errored and whether it makes sense to return a 'retriable' status code (e.g. 502)

@fbs
Copy link
Author

fbs commented Jun 2, 2025

Ill pick this up again next week

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: sources Anything related to the Vector's sources
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants